Parallel Symbolic Factorization for Sparse LU Factorization with Static Pivoting
Abstract
In this paper we consider a direct method for solving a sparse unsymmetric system of linear equations Ax = b, namely Gaussian elimination. The method explicitly factors the matrix A into the product of L and U, where L is a unit lower triangular matrix and U is an upper triangular matrix, and then solves LUx = b one factor at a time. A central notion in sparse LU factorization is fill-in: an element that is zero in the original matrix A but becomes nonzero during the factorization. Because fill-ins can be computed without referring to the numerical values of the matrix, and because memory must be allocated and computations organized before the numerical values of the factors are calculated, the solution of a sparse system is divided into several phases.

We present here the specific phases of SuperLU_DIST [4], a widely used solver for large sparse unsymmetric systems on distributed memory computers. The first step chooses a permutation matrix P1 and diagonal matrices D1 and D2 so that P1 D1 A D2 has large entries on the diagonal, which helps ensure the accuracy of the final solution. The second step orders equations and variables by choosing a permutation matrix P2 so that the factors L and U of P2^T P1 D1 A D2 P2 are as sparse as possible. The third step performs a symbolic analysis, that is, it identifies the locations of the nonzero entries of L and U. Finally, the fourth step computes the numerical values of the factors L and U.

We discuss the design and implementation of a memory-scalable symbolic factorization algorithm for unsymmetric matrices on distributed memory machines. Its integration into SuperLU_DIST will make the solver fully parallel. Earlier work addressed the parallelization of the numerical factorization (step 4 in SuperLU), because its complexity is generally of higher order than that of the other steps. This is now a well-understood problem, and the algorithm implemented in SuperLU has proved to be highly parallel and efficient. Techniques have been proposed for computing fill-reducing orderings in parallel (step 2 in SuperLU), and we review them briefly later in this section. More recent research has focused on efficient parallel algorithms for permuting large entries onto the diagonal (step 1 in SuperLU). All these algorithms use distributed data structures; in particular, the input matrix A is distributed over the processors. They offer good overall scalability, including memory scalability: if the problem size and the number of processors are both increased by the same factor, the memory used per processor stays the same.
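To make the idea of symbolic analysis (the third step) concrete, here is a minimal sequential sketch in Python. It is not the parallel algorithm discussed in the paper and not SuperLU_DIST code; the function name `symbolic_lu` and the set-based data layout are assumptions chosen for illustration. It assumes the pivot order is fixed in advance (static pivoting) and that every diagonal entry is nonzero, so the fill-in can be found from the sparsity pattern alone.

```python
def symbolic_lu(n, nonzeros):
    """Predict the nonzero pattern of L + U for an n x n sparse matrix.

    `nonzeros` is an iterable of (row, col) positions. No numerical values
    are used. Assumes no pivoting during elimination and a nonzero diagonal.
    Returns (pattern, fill), where fill lists positions that were zero in
    the input but become nonzero during elimination.
    """
    # rows[i] holds the column indices of the structural nonzeros in row i.
    rows = [set() for _ in range(n)]
    for i, j in nonzeros:
        rows[i].add(j)
    for i in range(n):
        rows[i].add(i)  # assumption: diagonal entries are nonzero

    original = {(i, j) for i in range(n) for j in rows[i]}

    for k in range(n):
        # Columns to the right of the pivot in row k.
        upper_k = {j for j in rows[k] if j > k}
        for i in range(k + 1, n):
            if k in rows[i]:
                # Eliminating entry (i, k) adds row k's pattern to row i.
                rows[i] |= upper_k

    pattern = {(i, j) for i in range(n) for j in rows[i]}
    fill = sorted(pattern - original)
    return pattern, fill


# A 4 x 4 "arrow" matrix: dense first row and column plus the diagonal.
arrow = ({(0, j) for j in range(4)}
         | {(i, 0) for i in range(4)}
         | {(i, i) for i in range(4)})
_, fill_first = symbolic_lu(4, arrow)
print(len(fill_first))  # 6: the trailing 3 x 3 block fills in completely

# The same matrix with rows and columns reversed (dense part ordered last).
reversed_arrow = {(3 - i, 3 - j) for i, j in arrow}
_, fill_last = symbolic_lu(4, reversed_arrow)
print(fill_last)  # []: this ordering produces no fill-in
```

The two orderings of the same matrix also show why the second step matters: a symmetric permutation changes the amount of fill-in without changing the problem being solved.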
Similar resources
S+: Efficient 2D Sparse LU Factorization on Parallel Machines
Static symbolic factorization coupled with supernode partitioning and asynchronous computation scheduling can achieve high gigaflop rates for parallel sparse LU factorization with partial pivoting. This paper studies properties of elimination forests and uses them to optimize supernode partitioning, amalgamation, and execution scheduling. It also proposes supernodal matrix multiplication to speed up...
Parallel Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures
Gaussian elimination based sparse LU factorization with partial pivoting is important to many scientific applications, but it is still an open problem to develop a high performance sparse LU code on distributed memory machines. The main difficulty is that partial pivoting operations make structures of L and U factors unpredictable beforehand. This paper presents an approach called S for paralleliz...
Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures
A sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but it is still an open problem to develop a high performance GEPP code on distributed memory machines. The main difficulty is that partial pivoting operations dynamically change computation and nonzero fill-in structures during the elimination process. This paper p...
A Comparison of 1-D and 2-D Data Mapping for Sparse LU Factorization with Partial Pivoting
This paper presents a comparative study of two data mapping schemes for parallel sparse LU factorization with partial pivoting on distributed memory machines. Our previous work has developed an approach that incorporates static symbolic factorization, nonsymmetric L/U supernode partitioning and graph scheduling for this problem with 1-D column-block mapping. The 2-D mapping is commonly conside...